Context-aware augmented reality object commands

ABSTRACT

Embodiments are disclosed that relate to operating a user interface on an augmented reality computing device comprising a see-through display system. For example, one disclosed embodiment includes receiving a user input selecting an object in a field of view of the see-through display system, determining a first group of commands currently operable based on one or more of an identification of the selected object and a state of the object, and presenting the first group of commands to a user. The method may further include receiving a command from the first group of commands, changing the state of the selected object from a first state to a second state in response to the command, determining a second group of commands based on the second state, where the second group of commands is different than the first group of commands, and presenting the second group of commands to the user.

BACKGROUND

An augmented reality computing device may display virtual objects overlaid on an environment that includes real objects. As such, a user of an augmented reality computing device may interact with both real and virtual objects during an augmented reality experience. In some instances, different objects may have different associated computing device actions that may be performed when interacting with the objects. Further, the commands available to any object may change depending upon a state of the object. As such, a user may have to learn a relatively large and complex set of interactions to navigate the augmented reality experience.

SUMMARY

Embodiments for operating a user interface on an augmented reality computing device comprising a see-through display system are disclosed. For example, one disclosed embodiment provides a method including receiving a user input selecting an object in a field of view of the see-through display system, determining a first group of commands currently operable based on one or more of an identification of the selected object and a state of the object, and presenting the first group of commands to a user. The method may further include receiving a command from the first group of commands, changing the state of the selected object from a first state to a second state in response to the command, determining a second group of commands based on the second state, where the second group of commands is different than the first group of commands, and presenting the second group of commands to the user.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic depiction of an example use environment for an augmented reality computing system.

FIG. 2 schematically shows an example embodiment of an augmented reality display device.

FIG. 3 is a block diagram of the display device of FIG. 2.

FIGS. 4-5 show example interactions of an augmented reality display device user with objects in an augmented reality environment.

FIG. 6 shows example context-aware speech recognition keyword groups.

FIG. 7 shows an example method for operating a user interface on an augmented reality computing device in accordance with the disclosure.

FIG. 8 schematically shows an example computing system.

DETAILED DESCRIPTION

A see-through display system, e.g., a head-mounted display (HMD) device or other near-eye display device, may be used to present an augmented reality view of a background scene to a user. Such an augmented reality environment may include various virtual and real objects with which a user may interact via user inputs, such as speech inputs, gesture inputs, eye tracking inputs, motion inputs, and/or any other suitable input type. As a more specific example, a user may utilize a speech input to perform a command associated with a selected object in the augmented reality environment.

As different objects may have different associated commands, and as the commands associated with an object may change with changes in a state of the object, developing knowledge of user interface commands available for various objects may take time and effort.

Accordingly, embodiments are disclosed herein that relate to facilitating interactions with objects in an augmented reality computing device environment. Briefly, the disclosed embodiments relate to presenting, to a user, representations of commands that are available for an object based upon an object identity, and also potentially based on a state of the object. The representations are configured to present just those commands that are currently available, so that a user can clearly see the available interactions at that time.

FIG. 1 shows an example embodiment of a use environment for an embodiment of a head-mounted display device 104, wherein environment 100 takes the form of a living room. A user 106 is viewing the living room through an augmented reality computing device in the form of the see-through head-mounted display device 104, and may interact with the augmented environment via a user interface of display device 104. FIG. 1 also depicts a user field of view 102 comprising a portion of the environment viewable through the display device 104, and thus the portion of the environment that may be augmented with images displayed via the display device 104. The augmented environment may include a plurality of real and virtual objects, e.g., media devices, with which the user may interact. For example, the augmented environment may include a virtual television device 111, a real telephone device 113, and a real media player device 115. Each of these objects in the augmented environment may be selectable by user 106 so that the user may perform actions on the selected object. In some embodiments, the user field of view 102 may be substantially coextensive with the user's actual field of vision, while in other embodiments the user field of view 102 may occupy a lesser portion of the user's actual field of vision.

As will be described in greater detail below, display device 104 may comprise one or more outwardly facing image sensors (e.g., two-dimensional cameras and/or depth cameras) configured to acquire image data (e.g., color/grayscale images, depth images/point cloud data, etc.) representing user environment 100 as the user navigates the environment. This image data may be used to obtain information regarding the layout of the environment (e.g., a three-dimensional surface map, etc.) and objects contained therein, such as bookcase 108, sofa 114, desk 117, telephony device 113, and music or media device 115.

Display device 104 may overlay one or more virtual images or objects over real objects in the user field of view 102. Example virtual objects depicted in FIG. 1 include a bowl 116, a virtual television device 111, and a contact list 119 displayed adjacent to telephony device 113. The virtual objects and images may be displayed in three dimensions such that the objects appear at different depths to user 106 within the user field of view 102. The virtual objects displayed by display device 104 may be visible only to user 106, and may move as user 106 moves, or may remain in a set location regardless of movement of user 106.

A user, e.g., user 106, of an augmented reality user interface may be able to perform any suitable actions on real and virtual objects in an augmented reality environment. However, as mentioned above, the interactions available to the user may depend on the object type and/or object state. For example, if the selected object is a virtual television device 111, then the commands available for virtual television device 111 may depend on a state of television device 111. As a more specific example, if virtual television device 111 is playing a movie, then commands such as “pause,” “forward,” and “rewind” may be available for the user to select in that state, whereas commands such as “play” may not be available. Likewise, if virtual television device 111 is not playing a movie when selected, then commands such as “play,” “forward,” and “rewind” may be available for the user to select, whereas other commands, such as “pause,” may not be available for selection.
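By way of illustration, the state-dependent command groups described above can be modeled as a simple lookup from an object's state to the set of currently operable commands. The following is a minimal sketch in Python; the state names and command sets are illustrative assumptions, not part of the disclosed embodiments.

```python
# Minimal sketch: mapping a television object's state to the commands
# currently operable in that state. States and commands are illustrative.

TV_COMMANDS_BY_STATE = {
    "playing": {"pause", "forward", "rewind", "adjust volume"},
    "paused": {"play", "forward", "rewind"},
}

def available_commands(state: str) -> set[str]:
    """Return the commands operable for the television object in `state`."""
    return TV_COMMANDS_BY_STATE.get(state, set())

assert "play" not in available_commands("playing")
assert "pause" not in available_commands("paused")
```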

Since the list of actions may vary between objects, and also over time as an object changes state, it may be difficult for a user to know which actions are available in a current context of a selected object. Thus, according to embodiments disclosed herein, augmented reality images displayed through display device 104, potentially in combination with information regarding use environment 100, may be used to present representations of commands associated with user-selected objects on display device 104. For example, upon selection by user 106 of one or more real or virtual objects in the augmented environment, display device 104 may present one or more augmented reality features that indicate possible actions which may be performed on the selected objects.

User 106 may select an object for interaction in any suitable manner detectable by display device 104. For example, user 106 may select an object by gazing at the object, potentially for a threshold amount of time, wherein a gaze direction may be detected by one or more eye tracking sensors. In another example, user 106 may select an object by moving his or her head, hand, entire body, etc., to or towards the object, as detected by inertial motion sensors and/or image sensors. A user also may issue voice commands detected by one or more microphones. Additional details regarding detecting user input commands via sensors of display device 104 are discussed below with respect to FIGS. 2-3.

FIG. 1 shows example gaze lines directed at various objects in the augmented reality environment, wherein the gaze lines represent the user focusing on the object at which the gaze lines intersect to select the object. For example, user focus 130 may indicate that user 106 desires to select music or media device 115, user focus 132 may indicate that user 106 desires to select telephony device 113, and user focus 134 may indicate that user 106 desires to select virtual television device 111. In some examples, a user may select only a single object in the augmented reality environment in order to perform an action on the object. In other examples, a user may select a plurality of objects in the augmented reality environment in order to perform actions on each object in the plurality of objects. For example, user 106 may select both media device 115 and virtual television device 111 in order to perform a command to reduce a volume of both devices. As another example, a global action may be performed on multiple objects in the environment when no objects are selected by a user. For example, if a user is looking out towards no particular object and says “raise volume,” the application, operating system, or other global software recognizes that no objects have focus and may raise a global setting for volume on all objects that recognize a “raise volume” command in a current state.
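The global-action behavior described above amounts to a dispatch rule: if no object has focus, a recognized command is forwarded to every object in the scene that accepts it in its current state. A minimal sketch, with a hypothetical SceneObject class standing in for real and virtual objects:

```python
# Sketch of command dispatch when no object has focus: a global command
# such as "raise volume" is forwarded to every object in the scene that
# recognizes it in its current state. SceneObject is a hypothetical stand-in.

class SceneObject:
    def __init__(self, name, commands):
        self.name = name
        self.commands = commands  # commands recognized in the current state

    def perform(self, command):
        print(f"{self.name}: performing '{command}'")

def dispatch(command, selected, scene):
    targets = selected if selected else scene  # no focus -> act globally
    for obj in targets:
        if command in obj.commands:
            obj.perform(command)

scene = [SceneObject("television", {"raise volume", "pause"}),
         SceneObject("media player", {"raise volume"}),
         SceneObject("telephone", {"answer"})]
dispatch("raise volume", selected=[], scene=scene)  # tv and media player act
```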

As remarked above, since the commands available for selected objects may depend on the type of objects selected, the state of the objects selected, and the number of objects selected, it may be difficult for a user to remember or keep track of possible available actions which may be performed on selected objects. Thus, as described in more detail below with regard to FIGS. 4-7, a current set of available actions based on the type, state, and number of user-selected objects may be presented to the user in order to assist the user in effectively utilizing available commands for a given context.

A see-through display device according to the present disclosure may take any suitable form, including but not limited to near-eye devices such as the head-mounted display device 104 of FIG. 1. FIG. 2 shows an example embodiment of a see-through display system 300, and FIG. 3 shows a block diagram of the display system 300.

Display system 300 comprises one or more lenses 302 that form a part of a display subsystem 304, such that images may be displayed via lenses 302 (e.g., via projection onto lenses 302, waveguide system(s) incorporated into lenses 302, and/or in any other suitable manner). Display system 300 further comprises one or more outward-facing image sensors 306 configured to acquire images of a background scene and/or physical space being viewed by a user, and may include one or more microphones 308 configured to detect sounds, such as voice commands from a user. Outward-facing image sensors 306 may include one or more depth sensors and/or one or more two-dimensional image sensors.

Display system 300 may further comprise a gaze detection subsystem 310 configured to detect a direction of gaze of each eye of a user, as described above. Gaze detection subsystem 310 may be configured to determine gaze directions of each of a user's eyes in any suitable manner. For example, in the depicted embodiment, gaze detection subsystem 310 comprises one or more glint sources 312, such as infrared light sources, configured to cause a glint of light to reflect from each eyeball of a user, and one or more image sensors 314, such as inward-facing sensors, configured to capture an image of each eyeball of the user. Changes in the glints from the user's eyeballs and/or a location of a user's pupil, as determined from image data gathered via the image sensor(s) 314, may be used to determine a direction of gaze. Further, a location at which gaze lines projected from the user's eyes intersect the external display may be used to determine an object at which the user is gazing (e.g., a displayed virtual object and/or a real background object). Gaze detection subsystem 310 may have any suitable number and arrangement of light sources and image sensors.
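One plausible way to realize the gaze-intersection test described above is to cast a ray along the detected gaze direction and pick the nearest object whose bounds it hits. The sketch below uses standard ray-sphere intersection; the scene data and bounding-sphere representation are illustrative assumptions rather than the disclosed implementation:

```python
# Sketch: selecting the object whose bounding sphere a gaze ray hits first.
# Standard ray-sphere intersection; all scene data here is illustrative.

import math

def ray_sphere_t(origin, direction, center, radius):
    """Distance along a normalized ray to a sphere, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4 * c
    if disc < 0:
        return None
    t = (-b - math.sqrt(disc)) / 2
    return t if t > 0 else None

def pick(origin, direction, objects):
    """Return the nearest object along the gaze ray, or None."""
    hits = [(t, name) for name, (center, r) in objects.items()
            if (t := ray_sphere_t(origin, direction, center, r)) is not None]
    return min(hits)[1] if hits else None

objects = {"tv": ((0, 0, 3), 0.5), "phone": ((1, 0, 2), 0.2)}
print(pick((0, 0, 0), (0, 0, 1), objects))  # -> "tv"
```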

Display system 300 may further comprise additional sensors. For example, display system 300 may comprise a global positioning system (GPS) subsystem 316 to allow a location of the display system 300 to be determined. This may allow a user of display system 300 to be presented with different groups of commands associated with selected objects at different locations, for example.

Display system 300 may further include one or more motion sensors 318 to detect movements of a user's head when the user is wearing display system 300. Motion data may be used, potentially along with eye-tracking glint data and outward-facing image data, for gaze detection, as well as for image stabilization to help correct for blur in images from the outward-facing image sensor(s) 306. The use of motion data may allow changes in gaze location to be tracked even if image data from outward-facing image sensor(s) 306 cannot be resolved. Likewise, motion sensors 318, as well as microphone(s) 308 and gaze detection subsystem 310, also may be employed as user input devices, such that a user may interact with the display system 300 via gestures of the eye, neck and/or head, as well as via verbal commands. It will be understood that the sensors illustrated in FIGS. 2 and 3 are shown for the purpose of example and are not intended to be limiting in any manner, as any other suitable sensors and/or combination of sensors may be utilized.

Display system 300 further comprises a controller 320 having a logic subsystem 322 and a data holding subsystem 324 (which also may be referred to as a storage system) in communication with the sensors, gaze detection subsystem 310, and display subsystem 304. Data holding subsystem 324 comprises instructions stored thereon that are executable by logic subsystem 322, for example, to receive and interpret inputs from the sensors, to identify movements of a user, to detect user selection of one or more objects, and to perform actions or commands on selected objects, among other tasks.

It will be appreciated that the depicted display devices 104 and 300 are described for the purpose of example, and thus are not meant to be limiting. It is to be understood that the display device may include additional and/or alternative sensors, cameras, microphones, input devices, output devices, etc. than those shown without departing from the scope of this disclosure. Further, the physical configuration of a display device and its various sensors and subcomponents may take a variety of different forms without departing from the scope of this disclosure.

FIG. 4 shows example embodiments of interactions at 402, 404, 406, and 408 of a user 106 wearing a see-through display device 104 with objects in a field of view 102 of the see-through display device 104. By way of example, FIG. 4 shows a first object 410 of a first type and a second object 414 of a second type in an augmented reality environment. First object 410 and second object 414 may be any suitable objects within an augmented reality environment, and may be virtual objects or real objects. For example, first object 410 may be a virtual television, such as virtual television device 111 shown in FIG. 1, overlaid onto a background scene, whereas second object 414 may be a media device such as device 115 of FIG. 1, or a telephony device such as device 113 of FIG. 1. As described in more detail below with regard to FIG. 6, each object in an augmented reality environment may have different groups of commands associated with it, depending on the type of object and state of the object.

At 402, FIG. 4 shows user 106 gazing at or focusing on first object 410, as illustrated via gaze lines 418, in order to select the first object. For example, user 106 may look in a direction of the object, and sensors in the augmented reality computing system, e.g., an eye-tracking sensor, may be used to detect that the user is focusing on the first object in order to select the first object. As other examples, head-tracking sensors and/or hand-tracking sensors may be used to determine a focus of a user on an object. In some examples, user 106 may focus on a target region 412 of the first object in order to activate a display of a menu of available actions or commands associated with the first object in its current state. Target region 412 may have any suitable appearance. For example, target region 412 may comprise an icon, a glowing particle, or other image or feature displayed at a location at or adjacent to the first object, e.g., in an upper right hand corner of first object 410.

First object 410 may be a television device, such as virtual television device 111 shown in FIG. 1. If user 106 selects the television device, e.g., by focusing on the device, while the television device is playing a movie, then the commands currently available for the television device may include commands such as “pause” and “adjust volume,” whereas commands not available for the device in its current state may include “play.” Likewise, if user 106 selects the television device while a movie is paused on the device, then available commands for the television device in its current state may include a “play” command, but not a “pause” command or an “adjust volume” command.

As another example, first object 410 may represent a telephone device, e.g., telephony device 113 shown in FIG. 1. If a user selects the telephone while the telephone is ringing, e.g., to answer or decline a call, then available commands for the device in its current state may include “answer” or “ignore,” whereas commands such as “call a contact” may not be available. However, if user 106 selects the telephone object during a condition when an incoming call is not occurring, then commands such as “answer” or “ignore” may not be available as commands for the telephone, whereas commands available for the device in its current state may include “call a contact.”

After the first object 410 is selected by the user and/or after the target region 412 is gazed upon by the user, visibility of representations of a first group of commands representing currently available commands may be triggered to display to the user a current set of available actions associated with the selected object. An example of such representations is shown at 404 in FIG. 4. In this specific example, representations 422 of groups of commands are displayed to the user at a location adjacent to the first object via the see-through display device 104. As another example, representations of groups of commands may be overlaid over the background scene adjacent to the selected object, as shown at 424. For example, the represented group of commands could be displayed on a virtual object that is mapped to align with the surface of a real-world object, e.g., a wall.

This presentation of available commands for a selected object may be triggered in a variety of ways. For example, the presentation of commands may be triggered in response to user 106 gazing or focusing on the selected object for a threshold time duration, e.g., at least one second. As another example, a menu of available commands may be activated and presented to the user in response to a degree of user stability as detected via eye tracking sensors and/or motion sensors. For example, if user 106 stands still and looks at first object 410 with minimal changes to body position and view direction for a threshold time duration, then menu visibility may be enabled. As another example, a command menu may be activated in response to user 106 gazing or focusing on a secondary embedded focus target, e.g., target region 412. For example, user 106 may look at an icon or glowing particle cluster overlaid on the object, enabling menu visibility. As another example, presentation of commands associated with a selected object may be activated by speech input received from user 106. For example, user 106 may speak a key phrase, such as “help me,” to activate a display of commands associated with the selected object.
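These triggering conditions can be summarized as a disjunction over a handful of sensor-derived signals. A minimal sketch, in which the threshold values and field names are illustrative assumptions:

```python
# Sketch: evaluating the menu-visibility triggers described above as a
# disjunction. Thresholds and field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class TriggerState:
    gaze_dwell_s: float    # time spent gazing at the selected object
    movement: float        # recent head/body movement magnitude
    on_focus_target: bool  # gazing at the embedded target region 412
    said_key_phrase: bool  # e.g., spoke "help me"

def should_show_menu(t: TriggerState,
                     dwell_threshold_s: float = 1.0,
                     movement_threshold: float = 0.05) -> bool:
    """Any one of the triggers enables menu visibility."""
    gaze_dwell = t.gaze_dwell_s >= dwell_threshold_s
    stable = t.movement < movement_threshold and t.gaze_dwell_s > 0.0
    return gaze_dwell or stable or t.on_focus_target or t.said_key_phrase

print(should_show_menu(TriggerState(1.2, 0.3, False, False)))  # True (dwell)
```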

As still another example, presentation of commands associated with the selected object may be activated by object/object proximity. For example, a presentation of a group of commands for an object may be activated in response to a change in distance between the selected object and another object in the environment. As a more specific example, an “Actions” holographic object may be floating in space or overlaid on the environment. User 106 may look at first object 410, which may, for example, be a floating TV object, and move it closer to the “Actions” object. As a result of this movement, a menu may emerge from the “Actions” object that enables the user to change the channel or perform other TV-specific actions.

As another example, as shown in FIG. 5, presentation of commands associated with the selected object may be activated by user/object proximity. For example, with reference to FIG. 5, after user 106 looks at object 508, a menu may begin to partially unroll out of the side of the object. As the user walks closer to the object, the menu may continue unrolling until it is completely visible to the user.

For example, as shown in FIG. 5 at 502, when user 106 is a first distance 512 away from an object 508, user 106 may select object 508 by gazing at or focusing on object 508 with a focus 510. However, at first distance 512, user 106 may be too far away from object 508 for some commands associated with the object to be applicable. Thus, even though user 106 selects object 508 at 502, no commands may be presented to the user. Next, as the user moves closer to object 508, a first group of commands available for the object may be displayed to the user. This is shown at 504, where, as user 106 moves toward object 508 to a second distance 514 from object 508 that is less than first distance 512, a first group of available commands 518 is presented to user 106. For example, if object 508 is a television device, then at distance 514, user 106 may be close enough to the device to perform certain actions, such as play or pause, but may still be too far away to perform other actions on the device, such as selecting interactive objects within an image displayed on the television object. As the user moves even closer to object 508, even more commands may become available and may be presented to the user. This is shown at 506, where, as user 106 moves to a third distance 516 less than the second distance 514 from the object 508, a second group of available commands 520 is presented to user 106. The second group of commands 520 may include a greater number of commands than the first group of commands 518, a lesser number of commands, or the same number of commands, whether the same commands or different ones. For example, if object 508 is a television device, then the second group of commands 520 may include the first group of commands 518, e.g., play and pause, plus additional available commands, e.g., commands regarding interaction with an in-picture object.
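The distance-dependent behavior of FIG. 5 can be sketched as a set of distance tiers, each unlocking additional commands as the user approaches. The distances and commands below are illustrative assumptions:

```python
# Sketch: growing the presented command group as the user approaches the
# object, per the FIG. 5 description. Tiers and commands are assumed.

DISTANCE_TIERS = [              # (max distance in meters, commands unlocked)
    (1.5, {"select in-picture object"}),
    (3.0, {"play", "pause"}),
]

def commands_for_distance(distance: float) -> set[str]:
    """Union of all command tiers whose range includes the user."""
    commands = set()
    for max_dist, tier in DISTANCE_TIERS:
        if distance <= max_dist:
            commands |= tier
    return commands

print(commands_for_distance(5.0))  # set() -- too far, nothing presented
print(commands_for_distance(2.0))  # {'play', 'pause'}
print(commands_for_distance(1.0))  # adds in-picture interaction
```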

Various presentation methods, both non-interactive and interactive, may be used to present a group of commands associated with a selected object to the user. In some examples, presenting the group of commands to the user may include displaying, via the see-through display device 104, representations of commands at a location adjacent to the selected object 410, as shown at 422 in FIG. 4. As another example, representations of commands associated with the selected object may be overlaid on one or more regions in the augmented reality environment. For example, representations of commands may be overlaid on a surface of a wall adjacent to the selected object 410, as shown at 424.

The representations of commands displayed to user 106 may take any suitable form. For example, the representations may include icons, floating three-dimensional signs, lists of available command options, and other indications of available commands. As a more specific example, a text menu of available actions may be wrapped onto the surfaces of real-world walls, e.g., as discovered with surface reconstruction technology. Available actions for a selected object also may be displayed on selected objects directly, and may use standardized iconography. For example, a frame around a floating TV object might contain a speaker-image icon that lights up when audio-related actions are available to the user. As another example, audio representations may be presented to the user indicating available commands for a selected object. Further, a speech audio output may be presented to the user describing which actions are available for a selected object in its current state.

After being presented the first group of available commands for selected object 410, user 106 may then provide user input selecting one of the commands from the first group of commands. Examples of user input for selecting a command include but are not limited to speech, touch, gaze, gesture, posture, or any combination of these. Where users may select commands via speech, a plurality of key phrases may be associated with each command so that, in response to a user speech input of a key phrase associated with a command, the command corresponding to the key phrase is performed on the selected object. This is described in more detail below with regard to FIG. 6.

In some examples, performing the command on the first object 410 may change the state of the first object, e.g., from a first state to a second state. For example, at 404, FIG. 4 shows first object 410 in a first state, and at 406, in a second state after executing the command in response to the user input. As a more specific example, if first object 410 is a television device in a paused video state when selected at 402 and the user inputs a command of “play,” then the second state of first object 410 may be a playing video state at 406.

When the first object changes state, a second group of commands may be provided to the user, as shown by the difference between user interfaces 422 and 426 in FIG. 4. The second group of commands may be different from the first group of commands, as the commands available to an object may change as a state of the object changes.

FIG. 4 further shows, at 408, user 106 after ceasing to gaze at first object 410, so that first object 410 is no longer a selected object. After user 106 deselects first object 410, first object 410 may remain in its second state, or may change to a different state, as user 106 focuses on and selects a second object 414, as indicated via gaze lines 430. After user 106 selects second object 414, representations of groups of commands associated with the second object are displayed to user 106. These representations may be displayed adjacent to second object 414 as shown at 434, overlaid on other objects in the environment, e.g., on a wall adjacent to second object 414 as shown at 436, or displayed in any other suitable manner.

In some examples, user 106 may select multiple objects, e.g., two or more objects, in an environment and perform actions on the selected objects. For example, user 106 may select both first object 410 and second object 414 via one or more inputs, such as gazing at the objects, speech input, gesture input, combinations thereof, and/or any other suitable input. In this instance, the specific groups of commands for multiple objects may be different than the groups of commands available when one object is selected. For example, if first object 410 is a television device playing a video and second object 414 is a media device playing music, and user 106 selects both the first and second objects, then the group of commands presented to the user may include volume control commands to globally adjust the volume of both devices. Such groups of commands related to global behaviors and/or multiple selected objects may be activated by a higher-level application, such as via an operating system of the augmented reality computing device, or at an application level of an application running on the augmented reality computing device. In contrast, groups of commands specialized to per-object behaviors may be activated within each object during state transitions of the object. For example, per-object groups of commands may change along with the object as its properties and capabilities change over time.
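One plausible rule for deriving a multi-object command group, consistent with the shared-volume example above, is to present only the commands that every selected object recognizes in its current state. A minimal sketch:

```python
# Sketch: a multi-object command group as the intersection of the per-object
# command sets, so only commands every selected object recognizes are shown.
# One plausible rule, not the disclosed implementation; commands are assumed.

def multi_select_commands(objects: list) -> set:
    """Commands recognized by every selected object in its current state."""
    if not objects:
        return set()
    commands = set(objects[0])
    for obj in objects[1:]:
        commands &= obj
    return commands

tv = {"pause", "rewind", "raise volume", "lower volume"}
media_player = {"skip track", "raise volume", "lower volume"}
print(multi_select_commands([tv, media_player]))  # volume commands only
```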

It may be desirable for applications which use speech recognition systems to register a minimal number of speech commands, in order to reduce false positives where a user accidentally triggers an undesired action, and to ensure that speech keywords are valid in a current context.

At the same time, for each command in a group of commands associated with selected objects, synonyms for speech keywords may be used to enable an application to map multiple phrases to a single command. In this way, a user may be more likely to successfully remember a speech command if several synonyms are available for the same command, since the user needs to remember only one of them.

For example, as remarked above, in some examples, for each command in a group of commands associated with selected objects, a plurality of key phrases may be associated with the command. In response to receiving speech input from a user of a selected key phrase in the plurality of key phrases, the command corresponding to the selected key phrase may be performed. In this way, complexity and performance costs associated with an augmented reality computing device with different command groups for different types and states of objects may be reduced, leading to a more scalable system for both activating action menus for selected objects and triggering actions within a menu.

FIG. 6 shows example context-aware speech recognition key phrase groups for a first object type 602 and a second object type 604 during different states. For example, first object type 602 may have a first state 606 and a first group of commands 614 associated with the first state 606. The first group of commands 614 may include a plurality of commands, e.g., command 1 through command N, which are recognizable by a selected object with the first object type in the first state. Each command in the first group of commands 614 may have a plurality of key phrases associated with the command. For example, command 1 has key phrase 1 through key phrase N associated with it.

The first object type 602 may also have a second state 608 and a second group of commands 616 associated with the second state 608. The second group of commands 616 may include a plurality of commands, e.g., command 1 through command N, which are recognizable by a selected object with the first object type in the second state. Each command in the second group of commands 616 may have a plurality of key phrases associated with the command. For example, command 1 has key phrase 1 through key phrase N associated with it.

A similar system may be used for a second object type 604. For example, second object type 604 may have a first state 610 and a first group of commands 618 associated with the first state 610. The first group of commands 618 may include a plurality of commands, e.g., command 1 through command N, which are recognizable by a selected object with the second object type in the first state. Each command in the first group of commands 618 may have a plurality of key phrases associated with the command.

The second object type 604 may also have a second state 612 and a second group of commands 620 associated with the second state 612. The second group of commands 620 may include a plurality of commands, e.g., command 1 through command N, which are recognizable by a selected object with the second object type in the second state. Each command in the second group of commands 620 may have a plurality of key phrases associated with the command.
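The FIG. 6 organization lends itself to a nested mapping keyed by object type, then state, then command, with each command carrying its list of key phrases. A minimal sketch with illustrative entries:

```python
# Sketch of the FIG. 6 organization: key phrase groups keyed by object type,
# then by state, then by command. All entries are illustrative assumptions.

PHRASE_GROUPS = {
    "television": {
        "paused":  {"play":   ["play", "start", "resume video"]},
        "playing": {"pause":  ["pause", "hold on"],
                    "rewind": ["rewind", "go back"]},
    },
    "telephone": {
        "ringing": {"answer": ["answer", "pick up"],
                    "ignore": ["ignore", "decline call"]},
        "idle":    {"call":   ["call a contact", "dial"]},
    },
}

def phrases_for(object_type: str, state: str) -> dict:
    """Key phrases recognizable for an object of `object_type` in `state`."""
    return PHRASE_GROUPS.get(object_type, {}).get(state, {})

print(phrases_for("telephone", "ringing"))  # answer/ignore phrase groups
```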

In some examples, these key phrases may be handled by a speech recognition system at an application level for an application running on an augmented reality computing device, or at an operating system level of an augmented reality computing device. For example, a user may provide speech input of one of the key phrases associated with a command in the first group of commands, and the application, the operating system, or the speech recognition system may interpret the speech input of a key phrase into the associated command and send the associated command to the user-selected objects.

As a more specific example, in an environment containing a floating holographic television object selected by a user, the speech recognition system may associate the key phrases “load video,” “play movie,” and “start” with a “play” command recognizable by the television object. For example, the television object may not recognize the key phrases “load video,” “play movie,” and “start” but may only recognize the command “play.” In this example, the speech recognition system may translate the key phrases “load video,” “play movie,” and “start” into the command “play” sent to the television object to perform the play command.

Further, key phrases associated with commands may be selectively activated or deactivated in response to a change in context, changes in user selections of objects, and changes in states of the selected objects. For example, individual objects may enable or disable groups of key phrases as they go through context transitions. As a more specific example, when looking at a video-playing television object, the object may send a request to the speech recognition system to activate a group of key phrases associated with a particular command based on knowledge of its own properties, such as type and state. Activating a group of key phrases enables the speech management system to register all key phrases (and synonyms) that belong to the group of key phrases so that the system may be able to recognize each key phrase and perform the corresponding command. Further, a group of key phrases may be deactivated in response to a selection of an object and/or other suitable circumstances, so that an inactive action for the selected object does not have speech commands registered to it, thus helping to decrease the chance of speech recognition false positives. In some examples, global key phrase groups associated with global commands, e.g., when multiple objects are selected by a user, may be activated or deactivated at higher levels of the application which have visibility across multiple objects and have a better understanding of application and current augmented environment context.
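The activation and deactivation of key phrase groups during state transitions might be sketched as follows; the recognizer interface and the television object's phrase groups are hypothetical stand-ins, not an actual speech recognition API:

```python
# Sketch: an object activating/deactivating its key phrase groups in a
# speech recognizer as it transitions between states. All names hypothetical.

class SpeechRecognizer:
    def __init__(self):
        self.registered = {}  # phrase -> command currently listened for

    def activate(self, group):
        self.registered.update(group)

    def deactivate(self, group):
        for phrase in group:
            self.registered.pop(phrase, None)

class TelevisionObject:
    def __init__(self, recognizer):
        self.recognizer = recognizer
        self.state = "paused"
        self.groups = {
            "paused":  {"play": "play", "start": "play"},
            "playing": {"pause": "pause", "hold on": "pause"},
        }
        self.recognizer.activate(self.groups[self.state])

    def transition(self, new_state):
        # On a state change, swap which phrases the recognizer listens for.
        self.recognizer.deactivate(self.groups[self.state])
        self.state = new_state
        self.recognizer.activate(self.groups[new_state])

rec = SpeechRecognizer()
tv = TelevisionObject(rec)
tv.transition("playing")
print(rec.registered)  # only the "playing"-state phrases remain registered
```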

FIG. 7 shows an example method 700 for operating a user interface on an augmented reality computing device comprising a see-through display system and a sensor system including one or more sensors, e.g., including an eye tracking sensor.

At 702, method 700 includes displaying one or more virtual images such that a field of view of a user through the see-through display system comprises a set of augmented reality features. The set of augmented reality features may be based upon a type of, content of, and/or property of the virtual images and/or real objects. For example, the augmented reality features may include real or virtual objects selectable by a user. Further, the features also may include sounds or other augmented reality environment aspects.

At 704, method 700 includes receiving a user input selecting an object. Examples of user inputs may include gaze inputs, voice inputs, gesture inputs, posture inputs, body motion inputs, eye tracking inputs, and/or any other suitable inputs. As one more specific non-limiting example, an object may be selected in response to determining that the user has gazed at the selected object for a duration greater than a first time threshold. This is shown at 706, where method 700 may include determining that a user has gazed at the selected object.

At 708, method 700 includes determining a first group of commands that are operable based on one or more of an identification of the selected object and a state of the object. In some examples, for each command in the first group of commands, a plurality of key phrases may be associated with the command, so that, in response to receiving speech input of a selected key phrase in the plurality of key phrases, a command corresponding to the selected key phrase is performed.

At 712, method 700 includes presenting the first group of commands to the user. For example, presenting the first group of commands to the user may include displaying representations of commands in the first group of commands, potentially at a location adjacent to the selected object, via the see-through display system. As another example, presenting the first group of commands to the user may include displaying, via the see-through display system, representations of commands in the first group of commands on a surface of the selected object or of a second object, where the second object is different from the selected object.

As mentioned above, various triggering conditions may be used to determine whether to present the first group of commands to a user. For example, at 714, method 700 may include presenting the first group of commands to the user if an amount of user movement is less than a movement threshold. As a more specific example, the first group of commands may be presented to the user in response to an amount of user movement less than a movement threshold for a threshold duration while the user focuses on the selected object. Likewise, at 716, method 700 may include presenting the first group of commands to the user if the user is gazing at a target region of the selected object, or other suitable target. As another example, the first group of commands may be presented to the user in response to determining that the user has gazed at the selected object for a time duration greater than a second time threshold, where the second time threshold is greater than a first time threshold used in the determination of whether the user is selecting the object. Further, at 718, method 700 may include presenting the first group of commands to the user in response to a speech input received from the user.

Further, as mentioned above, distance also may be used as a trigger to display commands. Thus, at 720, method 700 may include increasing a number of commands presented to the user in response to a decrease in a user distance from the selected object. For example, a number of commands in the first group of commands presented to the user may be increased in response to detecting a decrease in distance between the user and the selected object.

At 722, method 700 includes receiving a command from the first group of commands. For example, at 724, method 700 may include receiving speech input of a key phrase associated with the command. In response, method 700 comprises, at 726, performing the command.

Further, at 728, method 700 may include changing a state of the selected object. For example, the state of the selected object may be changed from a first state to a second state in response to the command. In some examples, the command received at 722 may be deactivated in response to a change of state of the selected object. In response, at 730, method 700 includes determining a second group of commands. The second group of commands may be determined based on the second state, where the second group of commands is different than the first group of commands. Further, in some examples, for each command in the second group of commands, a plurality of key phrases may be associated with the command, so that in response to receiving speech input of a selected key phrase in the plurality of key phrases, a command corresponding to the selected key phrase may be performed.

At 734, method 700 includes presenting the second group of commands to the user. The second group of commands may be presented to the user, for example, in response to changing the state of the selected object from the first state to the second state, and may be presented in any suitable manner and at any suitable location.
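Taken together, steps 708 through 734 amount to: determine and present a first command group, execute a received command, then re-derive and present the second group after the state change. A minimal end-to-end sketch, with illustrative state and command tables:

```python
# Sketch tying method 700 together: determine and present a first command
# group, execute a received command, then re-derive and present the second
# group after the state change. All object data is illustrative.

COMMANDS = {("television", "paused"):  {"play", "rewind"},
            ("television", "playing"): {"pause", "rewind", "adjust volume"}}

TRANSITIONS = {("paused", "play"): "playing",
               ("playing", "pause"): "paused"}

def present(group):
    print("available:", sorted(group))

def handle_command(obj_type, state, command):
    first_group = COMMANDS[(obj_type, state)]             # step 708
    present(first_group)                                  # step 712
    if command in first_group:                            # step 722
        state = TRANSITIONS.get((state, command), state)  # steps 726-728
        present(COMMANDS[(obj_type, state)])              # steps 730-734
    return state

handle_command("television", "paused", "play")
```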

Thus, according to the above embodiments, a user may be presented with information regarding currently available commands operable on a selected object based upon information such as the object type and/or state. This may facilitate the learning and operation of an augmented reality user interface.

In some embodiments, the methods and processes described above may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.

FIG. 8 schematically shows a non-limiting embodiment of a computing system 800 that can enact one or more of the methods and processes described above. Display device 104 may be one non-limiting example of computing system 800. Computing system 800 is shown in simplified form. It will be understood that virtually any computer architecture may be used without departing from the scope of this disclosure. In different embodiments, computing system 800 may take the form of a display device, wearable computing device, mainframe computer, server computer, desktop computer, laptop computer, tablet computer, home-entertainment computer, network computing device, gaming device, mobile computing device, mobile communication device (e.g., smart phone), etc.

Computing system 800 includes a logic subsystem 802 and a storage subsystem 804. Computing system 800 may optionally include a display subsystem 806, input subsystem 808, communication subsystem 810, and/or other components not shown in FIG. 8.

Logic subsystem 802 includes one or more physical devices configured to execute instructions. For example, the logic subsystem may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, or otherwise arrive at a desired result.

The logic subsystem may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic subsystem may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. The processors of the logic subsystem may be single-core or multi-core, and the programs executed thereon may be configured for sequential, parallel, or distributed processing. The logic subsystem may optionally include individual components that are distributed among two or more devices, which can be remotely located and/or configured for coordinated processing. Aspects of the logic subsystem may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.

Storage subsystem 804 includes one or more physical devices configured to hold data and/or instructions executable by the logic subsystem to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage subsystem 804 may be transformed, e.g., to hold different data.

Storage subsystem 804 may include removable media and/or built-in devices. Storage subsystem 804 may include optical memory devices (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory devices (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory devices (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage subsystem 804 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.

It will be appreciated that storage subsystem 804 includes one or more physical devices. However, in some embodiments, aspects of the instructions described herein may be propagated by a pure signal (e.g., an electromagnetic signal, an optical signal, etc.) via a communications medium, as opposed to being stored on a storage device. Furthermore, data and/or other forms of information pertaining to the present disclosure may be propagated by a pure signal.

In some embodiments, aspects of logic subsystem 802 and of storage subsystem 804 may be integrated together into one or more hardware-logic components through which the functionality described herein may be enacted. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), system-on-a-chip (SOC) systems, and complex programmable logic devices (CPLDs), for example.

When included, display subsystem 806 may be used to present a visual representation of data held by storage subsystem 804. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage subsystem, and thus transform the state of the storage subsystem, the state of display subsystem 806 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 806 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic subsystem 802 and/or storage subsystem 804 in a shared enclosure, or such display devices may be peripheral display devices.

When included, input subsystem 808 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.

When included, communication subsystem 810 may be configured to communicatively couple computing system 800 with one or more other computing devices. Communication subsystem 810 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 800 to send and/or receive messages to and/or from other devices via a network such as the Internet.

It will be understood that the configurations and/or approaches described herein are exemplary in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.

The subject matter of the present disclosure includes all novel and non-obvious combinations and sub-combinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.

1. On an augmented reality computing device comprising a see-through display system, a method for operating a user interface, the method comprising: receiving a user input selecting an object in a field of view of the see-through display system; determining a first group of commands currently operable based on one or more of an identification of the selected object and a state of the object; and presenting the first group of commands to a user.
 2. The method of claim 1, further comprising: receiving a command from the first group of commands; changing the state of the selected object from a first state to a second state in response to the command; determining a second group of commands based on the second state, the second group of commands being different than the first group of commands; and presenting the second group of commands to the user.
 3. The method of claim 1, wherein the user interface in the augmented reality computing device comprises a graphical user interface displayed via the see-through display system and a sensor system including one or more sensors.
 4. The method of claim 3, wherein the one or more sensors includes at least one of an eye-tracking sensor, a head-tracking sensor, and a hand-tracking sensor.
 5. The method of claim 1, wherein the user input is a focus of the user on the object.
 6. The method of claim 5, wherein receiving a user input selecting an object in a field of view of the see-through display system comprises determining that a user has gazed at the selected object for a time duration greater than a first time threshold, and wherein presenting the first group of commands to the user comprises presenting the first group to the user in response to determining that the user has gazed at the selected object for a time duration greater than a second time threshold, where the second time threshold is greater than the first time threshold.
 7. The method of claim 5, wherein presenting the first group of commands to the user comprises presenting the first group of commands in response to an amount of user movement less than a movement threshold for a threshold duration while the user focuses on the object.
 8. The method of claim 1, wherein presenting the first group of commands to the user comprises presenting the first group of commands in response to determining that a user is gazing at a target region of the selected object.
 9. The method of claim 1, further comprising increasing a number of commands in the first group of commands presented to the user in response to detecting a decrease in distance between the user and the selected object.
 10. The method of claim 1, further comprising receiving a speech input from a user, and presenting the first group of commands to the user in response to the speech input received from the user.
 11. The method of claim 1, further comprising, for each command in the first group of commands, associating a plurality of key phrases with the command, and in response to receiving speech input of a selected key phrase in the plurality of key phrases, performing a command corresponding to the selected key phrase.
 12. The method of claim 1, wherein presenting the first group of commands to the user includes displaying representations of commands in the first group of commands at a location adjacent to the selected object via the see-through display system.
 13. The method of claim 1, wherein presenting the first group of commands to the user includes displaying, via the see-through display system, representations of commands in the first group of commands on a surface of a second object, where the second object comprises one or more of a real object and a virtual object different from the selected object.
 14. On an augmented reality computing device comprising a see-through display system, a method for operating a user interface, the method comprising: receiving a user input selecting an object in a field of view of the see-through display system; determining a first group of commands operable based on one or more of an identification of the selected object and a state of the object; presenting the first group of commands to a user; receiving a command from the first group of commands; changing the state of the selected object from a first state to a second state in response to the command; determining a second group of commands based on the second state, the second group of commands being different than the first group of commands; and presenting the second group of commands to the user.
 15. The method of claim 14, wherein receiving a user input selecting an object in a field of view of the see-through display system comprises determining that a user has gazed at the selected object, and wherein presenting the second group of commands to the user comprises presenting the second group of commands to the user in response to changing the state of the selected object from the first state to the second state.
 16. The method of claim 14, further comprising, for each command in the first and second groups of commands, associating a plurality of key phrases with the command, and in response to receiving speech input of a selected key phrase in the plurality of key phrases, performing a command corresponding to the selected key phrase and deactivating the command in response to a change of state of the selected object.
 17. The method of claim 14, wherein presenting the first group of commands to the user includes displaying representations of commands in the first group of commands at a location adjacent to the selected object via the see-through display system, and wherein presenting the second group of commands to the user includes displaying representations of commands in the second group of commands at a location adjacent to the selected object via the see-through display system.
 18. A wearable augmented reality computing device, comprising: a see-through display device; a graphical user interface displayed via the see-through display device; a sensor system including one or more sensors; a logic subsystem; and a storage subsystem comprising instructions stored thereon that are executable by the logic subsystem to: receive from the sensor system a user input selecting an object in a field of view of the see-through display device; determine a first group of commands operable based on one or more of an identification of the selected object and a state of the object; and present the first group of commands to a user via the graphical user interface.
 19. The device of claim 18, wherein the instructions stored on the storage subsystem are further executable by the logic subsystem to: receive a command from the first group of commands; change the state of the selected object from a first state to a second state in response to the command; determine a second group of commands based on the second state, the second group of commands being different than the first group of commands; and present the second group of commands to the user via the graphical user interface.
 20. The device of claim 18, wherein the instructions stored on the storage subsystem are further executable by the logic subsystem to: in response to a command received when no objects are selected by the user, perform the command globally on all objects for which the received command is recognized.